99 research outputs found
Lateral gene transfer, rearrangement, reconciliation
Background.
Models of ancestral gene order reconstruction have progressively integrated different evolutionary patterns and processes such as unequal gene content, gene duplications, and implicitly sequence evolution via reconciled gene trees. These models have so far ignored lateral gene transfer, even though in unicellular organisms it can have an important confounding effect, and can be a rich source of information on the function of genes through the detection of transfers of clusters of genes.
Result.
We report an algorithm together with its implementation, DeCoLT, that reconstructs ancestral genome organization based on reconciled gene trees which summarize information on sequence evolution, gene origination, duplication, loss, and lateral transfer. DeCoLT optimizes in polynomial time on the number of rearrangements, computed as the number of gains and breakages of adjacencies between pairs of genes. We apply DeCoLT to 1099 gene families from 36 cyanobacteria genomes.
Conclusion.
DeCoLT is able to reconstruct adjacencies in 35 ancestral bacterial genomes with a thousand gene families in a few hours, and detects clusters of co-transferred genes. DeCoLT may also be used with any relationship between genes instead of adjacencies, to reconstruct ancestral interactions, functions or complexes.
Availability.
http://pbil.univ-lyon1.fr/software/DeCoLT
Hidden breakpoints in genome alignments
During the course of evolution, an organism's genome can undergo changes that
affect the large-scale structure of the genome. These changes include gene
gain, loss, duplication, chromosome fusion, fission, and rearrangement. When
gene gain and loss occurs in addition to other types of rearrangement,
breakpoints of rearrangement can exist that are only detectable by comparison
of three or more genomes. An arbitrarily large number of these "hidden"
breakpoints can exist among genomes that exhibit no rearrangements in pairwise
comparisons.
We present an extension of the multichromosomal breakpoint median problem to
genomes that have undergone gene gain and loss. We then demonstrate that the
median distance among three genomes can be used to calculate a lower bound on
the number of hidden breakpoints present. We provide an implementation of this
calculation including the median distance, along with some practical
improvements on the time complexity of the underlying algorithm.
We apply our approach to measure the abundance of hidden breakpoints in
simulated data sets under a wide range of evolutionary scenarios. We
demonstrate that in simulations the hidden breakpoint counts depend strongly on
relative rates of inversion and gene gain/loss. Finally we apply current
multiple genome aligners to the simulated genomes, and show that all aligners
introduce a high degree of error in hidden breakpoint counts, and that this
error grows with evolutionary distance in the simulation. Our results suggest
that hidden breakpoint error may be pervasive in genome alignments.Comment: 13 pages, 4 figure
On pairwise distances and median score of three genomes under DCJ
In comparative genomics, the rearrangement distance between two genomes
(equal the minimal number of genome rearrangements required to transform them
into a single genome) is often used for measuring their evolutionary
remoteness. Generalization of this measure to three genomes is known as the
median score (while a resulting genome is called median genome). In contrast to
the rearrangement distance between two genomes which can be computed in linear
time, computing the median score for three genomes is NP-hard. This inspires a
quest for simpler and faster approximations for the median score, the most
natural of which appears to be the halved sum of pairwise distances which in
fact represents a lower bound for the median score.
In this work, we study relationship and interplay of pairwise distances
between three genomes and their median score under the model of
Double-Cut-and-Join (DCJ) rearrangements. Most remarkably we show that while a
rearrangement may change the sum of pairwise distances by at most 2 (and thus
change the lower bound by at most 1), even the most "powerful" rearrangements
in this respect that increase the lower bound by 1 (by moving one genome
farther away from each of the other two genomes), which we call strong, do not
necessarily affect the median score. This observation implies that the two
measures are not as well-correlated as one's intuition may suggest.
We further prove that the median score attains the lower bound exactly on the
triples of genomes that can be obtained from a single genome with strong
rearrangements. While the sum of pairwise distances with the factor 2/3
represents an upper bound for the median score, its tightness remains unclear.
Nonetheless, we show that the difference of the median score and its lower
bound is not bounded by a constant.Comment: Proceedings of the 10-th Annual RECOMB Satellite Workshop on
Comparative Genomics (RECOMB-CG), 2012. (to appear
Cassis: detection of genomic rearrangement breakpoints
Summary: Genomes undergo large structural changes that alter their organization. The chromosomal regions affected by these rearrangements are called breakpoints, while those which have not been rearranged are called synteny blocks. Lemaitre et al. presented a new method to precisely delimit rearrangement breakpoints in a genome by comparison with the genome of a related species. Receiving as input a list of one2one orthologous genes found in the genomes of two species, the method builds a set of reliable and non-overlapping synteny blocks and refines the regions that are not contained into them. Through the alignment of each breakpoint sequence against its specific orthologous sequences in the other species, we can look for weak similarities inside the breakpoint, thus extending the synteny blocks and narrowing the breakpoints. The identification of the narrowed breakpoints relies on a segmentation algorithm and is statistically assessed. Here, we present the package Cassis that implements this method of precise detection of genomic rearrangement breakpoints
A Unifying Model of Genome Evolution Under Parsimony
We present a data structure called a history graph that offers a practical
basis for the analysis of genome evolution. It conceptually simplifies the
study of parsimonious evolutionary histories by representing both substitutions
and double cut and join (DCJ) rearrangements in the presence of duplications.
The problem of constructing parsimonious history graphs thus subsumes related
maximum parsimony problems in the fields of phylogenetic reconstruction and
genome rearrangement. We show that tractable functions can be used to define
upper and lower bounds on the minimum number of substitutions and DCJ
rearrangements needed to explain any history graph. These bounds become tight
for a special type of unambiguous history graph called an ancestral variation
graph (AVG), which constrains in its combinatorial structure the number of
operations required. We finally demonstrate that for a given history graph ,
a finite set of AVGs describe all parsimonious interpretations of , and this
set can be explored with a few sampling moves.Comment: 52 pages, 24 figure
On the PATHGROUPS approach to rapid small phylogeny
We present a data structure enabling rapid heuristic solution to the ancestral genome reconstruction problem for given phylogenies under genomic rearrangement metrics. The efficiency of the greedy algorithm is due to fast updating of the structure during run time and a simple priority scheme for choosing the next step. Since accuracy deteriorates for sets of highly divergent genomes, we investigate strategies for improving accuracy and expanding the range of data sets where accurate reconstructions can be expected. This includes a more refined priority system, and a two-step look-ahead, as well as iterative local improvements based on a the median version of the problem, incorporating simulated annealing. We apply this to a set of yeast genomes to corroborate a recent gene sequence-based phylogeny
Sampling and counting genome rearrangement scenarios
Even for moderate size inputs, there are a tremendous number of optimal rearrangement scenarios, regardless what the model is and which specific question is to be answered. Therefore giving one optimal solution might be misleading and cannot be used for statistical inferring. Statistically well funded methods are necessary to sample uniformly from the solution space and then a small number of samples are sufficient for statistical inferring
Multichromosomal median and halving problems under different genomic distances
<p>Abstract</p> <p>Background</p> <p>Genome median and genome halving are combinatorial optimization problems that aim at reconstructing ancestral genomes as well as the evolutionary events leading from the ancestor to extant species. Exploring complexity issues is a first step towards devising efficient algorithms. The complexity of the median problem for unichromosomal genomes (permutations) has been settled for both the breakpoint distance and the reversal distance. Although the multichromosomal case has often been assumed to be a simple generalization of the unichromosomal case, it is also a relaxation so that complexity in this context does not follow from existing results, and is open for all distances.</p> <p>Results</p> <p>We settle here the complexity of several genome median and halving problems, including a surprising polynomial result for the breakpoint median and guided halving problems in genomes with circular and linear chromosomes, showing that the multichromosomal problem is actually easier than the unichromosomal problem. Still other variants of these problems are NP-complete, including the DCJ double distance problem, previously mentioned as an open question. We list the remaining open problems.</p> <p>Conclusion</p> <p>This theoretical study clears up a wide swathe of the algorithmical study of genome rearrangements with multiple multichromosomal genomes.</p
- âŠ